Graphical modeling of binary data using the LASSO: a simulation study
Abstract
BACKGROUND Graphical models have been identified as a promising new approach to modeling high-dimensional clinical data. They provide a probabilistic tool to display, analyze and visualize net-like dependence structures by drawing a graph that describes the conditional dependencies between the variables. Until now, research has focused mainly on Gaussian graphical models for continuous data following a multivariate normal distribution; satisfactory solutions for binary data were missing. We adapted the method of Meinshausen and Bühlmann to binary data and used the LASSO for logistic regression. The objective of this paper was to examine the performance of the Bolasso in the development of graphical models for high-dimensional binary data. We hypothesized that the Bolasso is superior to competing LASSO methods in identifying graphical models.
METHODS We analyzed the Bolasso for deriving graphical models in comparison with other LASSO-based methods. Model performance was assessed in a simulation study with random data generated via symmetric local logistic regression models and Gibbs sampling. The main outcome variables were the Structural Hamming Distance and the Youden Index. We applied the results of the simulation study to a real-life data set of functioning data from patients with head and neck cancer.
RESULTS Bootstrap aggregating, as incorporated in the Bolasso algorithm, greatly improved performance at larger sample sizes. The number of bootstraps had minimal impact on performance. The Bolasso performed reasonably well with a cutpoint of 0.90 and a small penalty term. Optimal prediction for the Bolasso leads to very conservative models in comparison with AIC, BIC or cross-validated optimal penalty terms.
CONCLUSIONS Bootstrap aggregating may improve variable selection if the underlying selection process is not too unstable due to small sample size and if one is mainly interested in reducing the false discovery rate. We propose using the Bolasso for graphical modeling with large sample sizes.
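To make the cutpoint and the two outcome measures concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that each bootstrap fit has already produced a set of undirected edges (the LASSO logistic regression fits themselves are not shown), that an edge is kept when its selection frequency reaches the cutpoint, and that for undirected graphs the Structural Hamming Distance is counted as the number of missing plus extra edges. The function names `stable_edges` and `graph_metrics` and the toy data are hypothetical.

```python
from collections import Counter


def stable_edges(bootstrap_edge_sets, cutpoint=0.90):
    """Keep edges whose selection frequency across bootstrap fits is >= cutpoint.

    Each element of bootstrap_edge_sets is a set of edges (i, j) with i < j,
    assumed to come from one bootstrap replicate (fitting is not shown here).
    """
    n_boot = len(bootstrap_edge_sets)
    counts = Counter(edge for edges in bootstrap_edge_sets for edge in edges)
    return {edge for edge, count in counts.items() if count / n_boot >= cutpoint}


def graph_metrics(true_edges, est_edges, n_nodes):
    """Compare an estimated undirected graph with the true one.

    Assumption: SHD is taken as false positives + false negatives, and the
    Youden Index as sensitivity + specificity - 1 over all possible edges.
    """
    n_possible = n_nodes * (n_nodes - 1) // 2
    tp = len(true_edges & est_edges)
    fp = len(est_edges - true_edges)
    fn = len(true_edges - est_edges)
    tn = n_possible - tp - fp - fn
    # Sensitivity or specificity is undefined if the true graph has no
    # edges or no non-edges; report NaN in that case.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return {
        "shd": fp + fn,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "youden": sensitivity + specificity - 1,
    }


# Toy example with 4 variables and 10 hypothetical bootstrap replicates:
# edge (0, 1) is selected in all 10, (1, 2) in 9, and (0, 3) in 5.
true_graph = {(0, 1), (1, 2), (2, 3)}
replicates = (
    [{(0, 1), (1, 2), (0, 3)}] * 5
    + [{(0, 1), (1, 2)}] * 4
    + [{(0, 1)}]
)

selected = stable_edges(replicates, cutpoint=0.90)
metrics = graph_metrics(true_graph, selected, n_nodes=4)
print(sorted(selected))
print(metrics)
```

In this toy example the frequently selected but spurious edge (0, 3) is dropped by the cutpoint, while the true edge (2, 3), which was never selected, is missed. This mirrors the conservative behavior described in the results: fewer false discoveries at the cost of some sensitivity.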
Similar resources
The Analysis of Bayesian Probit Regression of Binary and Polychotomous Response Data
The goal of this study is to introduce a statistical method regarding the analysis of specific latent data for regression analysis of the discrete data and to build a relation between a probit regression model (related to the discrete response) and normal linear regression model (related to the latent data of continuous response). This method provides precise inferences on binary and multinomia...
Study of the effect of the Health System Evolution Plan on the performance indicators of Tehran University of Medical Sciences hospitals: a case study using the Pabon Lasso model
Background and Aim: All hospitals need to be monitored and continuously evaluated. Pabon Lasso graphical model assesses the efficiency of hospitals using a combination of their input data and performance indicators. The aim of this study was to determine the effects of Iran Health System Evolution Plan on Tehran University of Medical Sciences (TUMS) hospitals’ performance indicators using the P...
Assessing the efficiency of hospitals by using Pabon Lasso graphic model
Background: Efficiency assessment is one of the fundamental issues in hospitals. It is possible to evaluate and compare hospitals with measuring performance indicators. The aim of this study was to compare the performance of affiliated hospitals in Bushehr University of Medical Sciences from the viewpoint of performance indicators by using the graphical Pabon Lasso model. Materials and Methods...
Learning the Network Structure of Heterogeneous Data via Pairwise Exponential Markov Random Fields
Markov random fields (MRFs) are a useful tool for modeling relationships present in large and high-dimensional data. Often, this data comes from various sources and can have diverse distributions, for example a combination of numerical, binary, and categorical variables. Here, we define the pairwise exponential Markov random field (PE-MRF), an approach capable of modeling exponential family dis...
An empirical comparative study of approximate methods for binary graphical models; application to the search of associations among
Looking for associations among multiple variables is a topical issue in statistics due to the increasing amount of data encountered in biology, medicine and many other domains involving statistical applications. Graphical models have recently gained popularity for this purpose in the statistical literature. Following the ideas of the LASSO procedure designed for the linear regression framework,...